RESEARCH SUMMARY


An  algorithm  for  ridge  regression  (BREX)  is  presented  which  can  be  used 
in  conjunction  with  Grosenbaugh's  REX  (1967),  a  linear  regression  program 
with  combinatorial  screening.  The  algorithm  uses  either  REX  punched  matrix 
input,  or  raw  data  with  suitable  transformations,  to  estimate  ridged  (biased) 
regression  coefficients.    A  modification  of  Marquardt's  (1970)  criterion  is 
proposed  for  selecting  the  best  biasing  level  (k  value).    Application  of  the 
algorithm  is  demonstrated  on  two  widely-used  data  sets  (Hoerl  and  Kennard, 
1970b;  Draper  and  Smith,  1966). 
INTRODUCTION


Multiple  linear  regression  is  one  of  the  most  commonly  used  statistical  techniques 
in  forestry.     Regression  analysis  is  used  routinely  to  develop  growth  and  yield  models 
and to estimate individual tree and stand volumes. This process can be divided into
three components: model hypothesis, screening of variables in the hypothesis, and
parameter estimation for the final model.

While many criteria have been proposed for selecting the best subset of variables,
screening all combinations does guarantee that the best set is actually found (Hocking
1976). Grosenbaugh (1967) developed a program, REX, that selects the independent
variables by screening all combinations and picking the combination that minimizes the
relative mean-squared error (the ratio of the regression mean-squared error to the
variance of the dependent variable). REX then uses the least-squares criterion to
estimate the regression parameters. These estimates are the best linear unbiased
estimates (BLUE's) for the model, but in the presence of multicollinearity they may be
too imprecise to be useful. Multicollinearity exists when at least one independent
variable is highly correlated with another, or with a linear combination of other
independent variables. The imprecision arises because multicollinearity between
independent variables causes the correlation matrix to approach singularity and the
variances of the parameter estimates to approach infinity.

When  faced  with  multicollinearity,  the  analyst  may  be  tempted  to  eliminate  those 
independent  variables  causing  the  problem  by  consciously  removing  them  from  the  model 
or  by  using  a  screening  method,  such  as  stepwise  regression,  which  has  a  tendency  to 
exclude  such  variables.     However,  this  may  destroy  the  usefulness  of  the  model  by 
eliminating  relevant  independent  variables. 

Another  approach  for  dealing  with  multicollinearity  is  to  apply  ridge  regression 
methodology.     A  ridged  solution  produces  biased  estimates  of  the  model  parameters,  but 
it  can  also  greatly  reduce  the  variances  of  the  parameters  so  that  the  sum  of  their 
mean-squared errors (MSE) is less than that for the least-squares solution. By
combining ridge regression with a least-squares program that screens all combinations, it
is  possible  to  first  select  the  set  of  variables  which  minimizes  MSE,  and  then  minimize 
any  resulting  multicollinearity  problems  by  ridging  the  parameter  estimates. 

For  those  interested  in  using  a  model  for  predictive  purposes,  the  ridged  model 
offers  two  advantages  over  the  least-squares  model.     First,  the  ridged  model  has  a 
lower  MSE  value  than  the  least-squares  model;  this  increases  confidence  in  predictions 
(Hoerl and Kennard 1970a). Second, the ridged coefficients can often meet theoretical
or practical constraints on the model when the least-squares estimates are of wrong
sign or magnitude (Hoerl and Kennard 1970b). The latter feature is particularly
important if the model is to be used outside of the range of the original data set, as is
often  the  case  in  simulation. 

For  users  primarily  interested  in  interpreting  the  regression  coefficients,  the 
presence of multicollinearity causes serious problems. Kmenta (1971) has shown that
the  least-squares  estimates  are  highly  imprecise  under  high  multicollinearity.  By 
using  ridge  regression,  the  precision  of  the  regression  coefficients  is  increased  and 
interpretation  becomes  more  dependable. 

This report presents a computer program, BREX (biased REX), which computes ridge
estimates  from  the  moment  matrix  output  of  REX.     For  those  who  do  not  use  REX,  BREX  can 
also  be  run  using  raw  input  data. 
PRINCIPLES OF RIDGE REGRESSION


Ridge  Regression 

The  multiple  linear  regression  model  is  commonly  written  as, 

Y  =  XB  +  e  (1) 

where  Y  is  the  n  vector  of  observations  on  the  dependent  variable,  X  is  the  nxp  matrix 
of  observations  on  the  independent  variables,  B  is  the  p  vector  of  parameters,  and  e  is 
the  n  vector  of  errors.     It  is  assumed  that  X  has  full  rank  p  and 

E(e) = 0,    E(ee') = \sigma^2 I    (2)

For  convenience,  it  is  also  assumed  that  all  the  variables  have  been  standardized  so 
that  the  sample  means  are  zero  and  the  sums  of  squares  are  one.     The  moment  matrix, 
X'X,  then  consists  of  the  sample  correlation  coefficients  (Draper  and  Smith  1966). 
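
As a minimal sketch of the standardization assumed here, the following Python fragment centers each column and scales it to unit sum of squares, so that X'X reproduces the sample correlations. The data values and the helper names `standardize` and `moment_matrix` are hypothetical and are not part of BREX.

```python
import math


def standardize(column):
    """Center a column and scale it so its sum of squares equals one."""
    n = len(column)
    mean = sum(column) / n
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def moment_matrix(columns):
    """Return X'X for a list of (already standardized) columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


# Hypothetical illustrative data, not taken from the report.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]

xtx = moment_matrix([standardize(x1), standardize(x2)])
for row in xtx:
    print(row)
```

The diagonal of the printed matrix is one (up to rounding) and the off-diagonal element is the sample correlation between the two columns.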


The  least-squares  estimates  are  given  by 


\hat{B} = (X'X)^{-1} X'Y    (3)


with  variance-covariance  matrix, 

COV(\hat{B}, \hat{B}) = \sigma^2 (X'X)^{-1}    (4)

The diagonal elements of the inverse of the correlation matrix are labeled the variance
inflation factors (VIF) by Marquardt and Snee (1975) since the variances of the
estimated parameters are directly proportional to these factors. If the independent
variables are uncorrelated, then

X'X = I = (X'X)^{-1}    (5)


and  the  VIF's  are  all  equal  to  one.     In  the  presence  of  high  correlations,  however, 
some  of  the  VIF's  become  much  larger  than  one  and  the  corresponding  estimates  are  very 
imprecise  (Marquardt  1970).     Because  the  least-squares  estimates  are  the  BLUE's  for  the 
model,  the  precision  can  only  be  improved  by  using  nonlinear  or  biased  estimators. 
Nonlinear  estimation  tends  to  be  computationally  difficult.     Among  the  alternatives  to 
nonlinear estimation are ridge regression, principal components, and shrinkage
estimators. Ridge regression is the most popular biased method, probably because its
relationship to least-squares and its statistical properties are clearly defined.
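
To make the growth of the VIF's concrete, consider the simplest case of two standardized regressors with correlation r. The inverse of the 2 x 2 correlation matrix has both diagonal elements equal to 1/(1 - r^2). The Python sketch below evaluates this closed form; the function name `vif_two_regressors` is hypothetical and the r values are arbitrary.

```python
def vif_two_regressors(r):
    """VIF shared by both regressors when their correlation is r.

    Closed form for the diagonal of the inverse of [[1, r], [r, 1]].
    """
    if abs(r) >= 1.0:
        raise ValueError("correlation matrix is singular for |r| >= 1")
    return 1.0 / (1.0 - r * r)


# Arbitrary illustrative correlations.
for r in (0.0, 0.5, 0.9, 0.99, 0.999):
    print(r, vif_two_regressors(r))
```

The VIF is exactly one for uncorrelated regressors and grows without bound as |r| approaches one, which is the singularity described in the text.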

Hoerl (1962) first suggested that adding a small positive constant k to the diagonal
elements of X'X would reduce the variances of the coefficients. The ridge estimates
are

B^* = (X'X + kI)^{-1} X'Y,    k > 0    (6)

with variance-covariance matrix,

COV(B^*, B^*) = \sigma^2 (X'X + kI)^{-1} X'X (X'X + kI)^{-1}.    (7)

Note  that  for  k  =  0  the  ridge  estimates  are  equal  to  the  least-squares  solution. 
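
Equation (6) can be sketched in a few lines of Python. This is an illustration only, not the BREX code: the solver, the function names, and the correlation values (r12 = 0.9, with correlations 0.8 and 0.7 between Y and the two regressors) are all hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with row pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge(xtx, xty, k):
    """Ridge estimates B* = (X'X + kI)^{-1} X'Y for standardized variables."""
    p = len(xty)
    shifted = [[xtx[i][j] + (k if i == j else 0.0) for j in range(p)]
               for i in range(p)]
    return solve(shifted, xty)


# Hypothetical correlation inputs, chosen only for illustration.
xtx = [[1.0, 0.9], [0.9, 1.0]]
xty = [0.8, 0.7]

print("k = 0.0:", ridge(xtx, xty, 0.0))   # least-squares solution
print("k = 0.1:", ridge(xtx, xty, 0.1))
```

In this made-up example the second least-squares coefficient is negative, while the ridged coefficient at k = 0.1 is positive and both coefficients are smaller in magnitude and closer together.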

Hoerl  and  Kennard  (1970a)  have  shown  that  although  the  ridge  estimates  are  biased, 
there  always  exists  a  k>0  such  that  the  sum  of  MSE's  of  the  parameter  estimates  is 
lower than for least-squares. Hocking and others (1976) have concluded, however, that
ridge regression is not necessarily superior to least-squares when the number of
independent variables is less than three. Vinod (1976) and Marquardt and Snee (1975) have
argued that minimal MSE is a better criterion than BLUE for judging the quality of
regression estimators. An estimator that is close to the true value with high
probability, though biased, is thought better than an unbiased estimator which has a low
probability of being near the true value.


Selecting  a  "k"  Value 

Although ridging can always reduce MSE for nonorthogonal regressors, no closed-form
solution exists for the k value that minimizes the summed MSE. Hoerl and Kennard
(1970b) proposed examining the ridged parameter estimates, B*, over a range of k values
(the ridge trace) and selecting one where B* stabilizes. Since we are interested mainly
in the variances of these estimates, this approach is indirect as well as very
subjective. Stability is a matter of degree since, as k goes to infinity, B* approaches zero.

The  situation  is  simplified  if  we  transform  the  variables  to  their  principal 
components  (linear  combinations  of  the  variables  which  are  mutually  orthogonal)  to 
obtain  a  generalization  of  ridge  regression.     Following  Hilt  and  Seequist  (1977),  there 
always  exists  an  orthogonal  transformation  matrix  P  such  that, 

P'X'XP = D and P'P = I.    (8)

The columns of P are the eigenvectors of X'X, and D is the diagonal matrix of
corresponding eigenvalues. The least-squares regression coefficients on the principal
components are given by

\hat{A} = D^{-1} P'X'Y    (9)

and the generalized ridge solution by

A^* = (D + K)^{-1} P'X'Y    (10)
where K is a diagonal matrix with nonnegative elements k_i. The MSE of each element
a_i^* of A* is minimized by setting k_i = \sigma^2 / a_i^2 (Hoerl and Kennard 1970a). These
optimal k_i's depend on the unknown parameters \sigma^2 and a_i, which can be replaced
with the consistent estimates \hat{\sigma}^2 and (\hat{a}_i)^2. However, this substitution
destroys the minimum MSE property (Vinod 1976). Since these least-squares estimates are
generally poor due to ill conditioning, Hoerl and Kennard (1976) suggest that they only
be used as a starting point to estimate initial values of the k_i's. These k_i's are then
used to find a new A*, and the iteration continues until the k_i's converge to
stationary values.
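
The optimum k_i = \sigma^2 / a_i^2 quoted above from Hoerl and Kennard (1970a) can be checked with a short derivation. This is a sketch under the model assumptions of equations (1) and (2); the symbol d_i for the i-th diagonal element (eigenvalue) of D is introduced here for illustration only.

```latex
% Element-wise form of equation (10): a_i^* = d_i \hat{a}_i / (d_i + k_i),
% where Var(\hat{a}_i) = \sigma^2 / d_i follows from equations (4) and (8).
\begin{align*}
\mathrm{MSE}(a_i^*)
  &= \underbrace{\frac{\sigma^2 d_i}{(d_i + k_i)^2}}_{\text{variance}}
   + \underbrace{\frac{k_i^2 a_i^2}{(d_i + k_i)^2}}_{\text{squared bias}} \\[4pt]
\frac{\partial\,\mathrm{MSE}(a_i^*)}{\partial k_i}
  &= \frac{2 d_i \left(k_i a_i^2 - \sigma^2\right)}{(d_i + k_i)^3} = 0
  \quad\Longrightarrow\quad
  k_i = \frac{\sigma^2}{a_i^2}
\end{align*}
```

The derivative is negative for k_i below this value and positive above it, so the stationary point is a minimum.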

Hemmerle  (1975)  presents  a  closed-form  solution  to  this  iterative  process,  but  finds 
that  it  often  leads  to  an  unacceptable  increase  in  the  error  sum  of  squares  (ESS) . 
He  proposes  allocating  the  allowable  inflation  of  ESS  proportionally  to  each  principal 
component. This method brings those a_i's previously set to zero back into the
solution, though they obviously contribute the most to variance inflation (Hocking and others
1976).     Until  a  more  satisfactory  procedure  for  constraining  the  increase  in  ESS  is  found, 
generalized  ridge  regression  lacks  practicality. 

With ordinary ridge regression (k_i = constant), one could average the generalized
k_i's to find a single k value. Hoerl and others (1975) suggest the harmonic mean, which
minimizes the effects of small a_i's that have little predictive value. This approach
leads to the estimator,

k = p\sigma^2 / A'A = p\sigma^2 / B'B    (11)

where p is the number of independent variables (the denominator is the squared length of
the parameter vector, which is unchanged by the switch to principal components). The same
estimator arises by minimizing the summed MSE of the parameter estimates for the special
case where X'X = I. Again, the least-squares estimates are used as a starting point
and then the iteration is continued on (B*)'B* until the k value converges.
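
The iteration just described can be sketched for the special case X'X = I, where equation (6) reduces to B* = \hat{B}/(1 + k). The Python fragment below is an illustration under that assumption only; the function name `iterate_k`, the stopping tolerances, and the input values are hypothetical and do not reproduce the BREX option.

```python
def iterate_k(p, sigma2, b_ls, max_iter=200, tol=1e-10, k_cap=1e6):
    """Iterate k = p*sigma2 / (B*)'B* starting from the least-squares B.

    Valid only for X'X = I, where B* = b_ls / (1 + k).
    Returns the converged k, or None if k grows past k_cap or the
    iteration limit is reached.
    """
    ss_ls = sum(b * b for b in b_ls)      # squared length of least-squares B
    k = p * sigma2 / ss_ls                # equation (11) at the starting point
    for _ in range(max_iter):
        ss_ridge = ss_ls / (1.0 + k) ** 2  # (B*)'B* for the current k
        k_new = p * sigma2 / ss_ridge
        if k_new > k_cap:
            return None                   # treated as divergence
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return None


# Hypothetical inputs: the same coefficients with a small and a large sigma^2.
print(iterate_k(2, 0.05, [0.8, 0.6]))   # settles near 0.127
print(iterate_k(2, 0.25, [0.8, 0.6]))   # grows without bound, returns None
```

In this special case the recursion is k_new = c(1 + k)^2 with c = p\sigma^2 / \hat{B}'\hat{B}, which has a finite fixed point only when c is at most 1/4. For larger c the sequence increases without bound.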

This procedure was included as an option in an earlier version of BREX. We found
that the initial least-squares estimates often produced k greater than one, and the
iteration then diverged to infinity. Mallows' (1973) estimator,

k = p\sigma^2 / [B'B - p\sigma^2]    (12)

was  also  considered  since  it  is  unbiased,  but,  as  it  is  always  larger  than  that  of 
Hoerl  and  others  (1975),  the  same  problem  would  occur. 

The  most  promising  method  for  selecting  a  k  value  was  that  of  Marquardt  (1970). 
He  sought  a  k  value  such  that  the  maximum  VIF  was  between  10  and  1,  and  closer  to  1. 
Obviously,  VIF's  less  than  1  are  undesirable  because  1  is  the  lower  limit  attained  by 
a  perfectly  orthogonal  system.     BREX  calculates  both  minimum  and  maximum  VIF's,  the 
former  being  a  secondary  criterion  which  preferably  would  be  not  much  less  than  1. 
Defining  a  desirable  range  such  as  this  allows  consideration  of  the  increase  in  the 
ESS  as  well. 
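
One possible reading of this criterion is sketched below in Python. The ridged VIF's are taken as the diagonal of (X'X + kI)^{-1} X'X (X'X + kI)^{-1}, following equation (7), and the rule chosen here (largest k on a grid whose maximum VIF is at most 10 and whose minimum VIF is at least 1) is an assumption made for illustration. The function names, the grid, and the correlation matrix are hypothetical.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with row pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        d = m[col][col]
        m[col] = [x / d for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def ridge_vifs(corr, k):
    """Diagonal of (R + kI)^{-1} R (R + kI)^{-1} for correlation matrix R."""
    n = len(corr)
    g = invert([[corr[i][j] + (k if i == j else 0.0) for j in range(n)]
                for i in range(n)])
    v = matmul(matmul(g, corr), g)
    return [v[i][i] for i in range(n)]


def select_k(corr, grid, vif_max=10.0, vif_floor=1.0):
    """Largest grid k with max VIF <= vif_max and min VIF >= vif_floor."""
    chosen = None
    for k in sorted(grid):
        vifs = ridge_vifs(corr, k)
        if max(vifs) <= vif_max and min(vifs) >= vif_floor:
            chosen = k
    return chosen


# Hypothetical three-variable correlation matrix and k grid.
corr = [[1.0, 0.9, 0.5], [0.9, 1.0, 0.4], [0.5, 0.4, 1.0]]
grid = [0.0, 0.01, 0.02, 0.05, 0.1, 0.2]
for k in grid:
    print(k, [round(v, 3) for v in ridge_vifs(corr, k)])
print("selected k:", select_k(corr, grid))
```

Because the rule depends only on the correlation matrix, the selected k does not depend on Y.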

A further advantage of Marquardt's criterion is that k is nonstochastic, depending
only on fixed X (Obenchain 1975). This property is required if the equations derived
by Hoerl and Kennard (1970a) for E(B*) and COV(B*, B*) are to be valid. The estimators
previously discussed are stochastic, depending on Y, and therefore, technically, their
moments are unknown.
PROGRAM OPERATION


Program  Availability 

BREX  is  written  in  FORTRAN  IV  and  is  operational  on  a  CDC  6400  and  a  CDC  CYBER 
73.     A  copy  of  the  program  can  be  obtained  by  writing: 

PROGRAM  BREX 

Renewable  Resources  Evaluation  Research  Work  Unit 
507  -  25th  Street 
Ogden,  Utah  84401 

Features  of  BREX 

BREX was specifically designed to be used in conjunction with Grosenbaugh's REX.
For  this  reason,  it  uses  as  input  the  moment  matrix  punched  by  REX.     To  allow  access 
by  other  users,  subroutines  MTRX  and  TRNX  were  added  for  the  conversion  of  raw  data  to 
REX  matrix  form. 

Control  card  input  is  assumed  to  be  from  cards,  but  the  data  matrix  or  observations 
may  be  read  from  tape.     If  raw  data  are  used,  punched  output  identical  to  that  of  REX 
can  be  requested  for  later  reuse.     For  a  large  number  of  observations,  efficiency  is 
greatly  increased  by  direct  use  of  the  moment  matrix. 

For  each  problem  to  be  solved,  the  user  can  select  ridge  output  for  up  to  10 
specified  k  values  or  accept  the  default  selection  of  20  values  over  the  range  0.005 
to  1.0.     Like  REX,  the  user  can  specify  the  use  of  the  moment  matrix  from  the  previous 
problem  for  any  problem  except  the  first  of  a  run.     In  this  way,  many  possible  models 
can  be  checked  on  a  single  run  while  only  reading  in  the  data  once. 

The  first  page  of  the  output  gives  the  problem  title  and  a  statement  of  parameters. 
For  raw  data,  the  first  two  and  the  last  transformed  observation  vectors  are  displayed 
as  a  check  against  errors  in  TRNX  or  an  incorrect  number  of  observations. 

The second page presents the least-squares solution with the standardized
coefficients, and the normal coefficients and their variances. Also displayed are the
estimated MSE, ESS, minimum and maximum VIF's, and adjusted R^2 (\bar{R}^2). Succeeding
pages show the ridge solution for each k value, giving statistics similar to those for
the least-squares solution.
Control Cards for the Use of BREX

                                                               Field
Function                                               Column   Name   Format

Card 1

Problem identification                                  5-76    NAME   18A4
Label for punched output (from raw data only)          77-80    NAME   A4

Card 2

Total number of variables in REX input matrix or         1-5    NNT    I5
  number of variables after transformation of raw
  data
Number of independent variables for this problem        6-10    NGX    I5
Number of k values to be read in                       11-15    NNK    I5
  (0 = default selection)
Use data from previous problem?                        16-20    IPD    I5
  (Yes = 1, No = 0)
Should regression be performed without intercept? 1/   21-25    IMO    I5
  (Yes = 1, No = 0)
Are input data from tape? 2/                           26-30    ICT    I5
  (Yes = tape number, No = 0)
Are the observations to be weighted? 1/                31-35    IW     I5
  (Yes = 1, No = 0)
Number of observations for raw data input              36-45    NOB    I10
  (= 0 for REX matrix input)
Number of variables (excluding weight) for             46-50    NVAR   I5
  each raw observation vector
  (= 0 for REX matrix input)
Tape number for punched output 2/                      51-55    IPO    I5
  (= 0 for REX matrix input)

(con.)

1/ Unlike REX, these options have no effect on REX matrix input except to
provide correct labeling on the first page of output. Also, they cannot be
used to change these specifications for a problem using data from a previous
one. It is important to repeat the same specifications for previous data
problems so that the output will be correctly labeled.

2/ The first statement of the main program specifies tape 7 for the punch
and tape 10 for data input. If other unit numbers must be used, simply change
the numbers in this statement.
Control Cards for the Use of BREX (Cont.)

                                                               Field
Function                                               Column   Name   Format

Card 3

Punch X's and a Y in the columns whose ordinals         1-50    NX     50A1
  are the subscripts of the independent variables
  and the dependent variable, respectively

Card 4

Up to 10 k values to be used, in order of ascending     1-80    XK     10F8.4
  magnitude (blank for default selection)

Cards 5 and 6

Variable format for raw data input                      1-80    FMT    20A4
  (blank for REX matrix input)                        / 1-80           / 20A4


These  six  cards  are  followed  by  the  input  data  if  they  are  from  cards. 
This  sequence  of  cards  (1-6  plus  data  cards,  if  any)  is  repeated  for  each 
problem  in  the  run.     After  the  last  problem  set,  a  card  punched  "DONE" 
in  columns  1-4  signals  the  end  of  the  run. 
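
The field layout of control card 2 can be illustrated with a short formatter. This is only a sketch in Python (BREX itself reads the card with FORTRAN I5 and I10 formats); the function name `format_card2` and the keyword interface are hypothetical. Field names and widths follow the control card table above, and the example values are those of the Gorman-Toman run in appendix B.

```python
# (field name, width in columns) in the order the fields appear on card 2.
CARD2_FIELDS = [
    ("NNT", 5), ("NGX", 5), ("NNK", 5), ("IPD", 5), ("IMO", 5),
    ("ICT", 5), ("IW", 5), ("NOB", 10), ("NVAR", 5), ("IPO", 5),
]


def format_card2(**values):
    """Return an 80-column card image; omitted fields are punched as zero."""
    known = {name for name, _ in CARD2_FIELDS}
    unknown = set(values) - known
    if unknown:
        raise ValueError("unknown field(s): " + ", ".join(sorted(unknown)))
    pieces = []
    for name, width in CARD2_FIELDS:
        text = str(int(values.get(name, 0)))
        if len(text) > width:
            raise ValueError(name + " does not fit in " + str(width) + " columns")
        pieces.append(text.rjust(width))   # integers are right-justified
    return "".join(pieces).ljust(80)


# Gorman-Toman example: 11 variables, 10 regressors, matrix input from tape 10.
gorman_toman = format_card2(NNT=11, NGX=10, ICT=10)
print(repr(gorman_toman))
```

Right-justifying each integer matters because blanks within a field are read as zeros, so a left-justified "11" in a five-column field would be read as 11000.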
Subroutine TRNX


This is a user-supplied subroutine that generates regression variables from raw
data using FORTRAN statements. The input variables, D(I), I = 1, NVAR, are read
according to the variable format on control cards 5 and 6 of the problem set. The output
variables, X(I), I = 1, NNT, and the weight W, then are used to form the X'X matrix.
TRNX is called once for each observation to assign values for all the independent and
dependent variables to be used for all the problems of a run. For each problem set,
control card 3 specifies the dependent variable and regressor variables in the
particular model.

The  cards  labeled  TRNX  10  through  TRNX  110  are  always  required.     In  the  example 
below,  the  raw  variables  are  used  untransformed  with  a  constant  weight  of  one. 


      SUBROUTINE TRNX                                                   TRNX  10
C                                                                       TRNX  20
C  SUBROUTINE TRNX ALLOWS NVAR RAW INPUT VARIABLES D(I) TO BE           TRNX  30
C  TRANSFORMED INTO (NGX+1) REGRESSION VARIABLES USING FORTRAN          TRNX  40
C  IF WEIGHTED REGRESSION, W MUST BE ASSIGNED A VALUE                   TRNX  50
C                                                                       TRNX  60
      COMMON/RMT/X                                                      TRNX  70
      COMMON/MT/D,W                                                     TRNX  80
      DIMENSION D(50), X(50)                                            TRNX  90
      DO 1 I=1,NVAR
      X(I) = D(I)
    1 CONTINUE
      W = 1.0
      RETURN                                                            TRNX 100
      END                                                               TRNX 110
APPLICATIONS OF BREX


To demonstrate the use of BREX, we chose two data sets, the Gorman-Toman 10-variable
set (Hoerl and Kennard 1970b, p. 71) and the Hald 4-variable set (Draper and
Smith 1966, p. 366). The former has been used in a number of papers on ridge regression
(Mallows 1973, Marquardt and Snee 1975, Obenchain 1975, and Vinod 1976), and the latter
is an excellent example of severe multicollinearity. The computer output generated by
these example data sets is found in appendix B.

Analysis  of  the  Gorman-Toman  Data 

Since  this  data  set  was  presented  in  the  form  of  a  correlation  matrix,  we  used  it 
directly as a moment matrix punched in REX format. The normal coefficients are
identical to the standardized coefficients because a correlation matrix was used as input.
The  regression  was  specified  as  unweighted  and  through  the  mean  with  matrix  input  from 
tape  10  (see  sample  output,  p.  13). 

For  the  least-squares  solution  (see  sample  output,  p.  13),  the  maximum  VIF  was 
just under 10, so one would expect a fairly low k value. Marquardt's criterion is
satisfied for all k values less than 0.2 (see p. 14-17), but we would select k = 0.04 as
this is the maximum k value in that range for which the minimum VIF is still greater
than one. This ridged solution has reduced the maximum VIF by 58 percent with only a
9 percent increase in the RSS relative to least-squares. Hoerl and Kennard (1970b)
selected k between 0.2 and 0.3 using the ridge trace. We would conclude that their
approach leads to overestimation of the amount of bias necessary to stabilize the
system. Note that the default selection of k values was truncated at k = 0.2 as the
maximum VIF fell under 1.0.


Analysis  of  the  Hald  Data 

In  this  example,  the  raw  data  (13  observations)  were  read  from  cards  and  punched 
moment  matrix  output  was  requested  (see  sample  output,  p.  19).     The  data  were  weighted 
by  1.0,  to  show  the  change  in  labeling,  and  were  used  untransformed  by  TRNX. 

This data set was severely multicollinear, as indicated by a maximum VIF of 282 for
the least-squares solution (see p. 20). Using our modified Marquardt criterion, we
selected k = 0.03, which reduced the maximum VIF by over 99 percent with a 30 percent
increase in RSS. \bar{R}^2 decreased less than 2 percent. The determinant of X'X has
increased more than 10 times, indicating the gain in stability.

The  second  problem  set  (see  p.   24-25)  in  this  run  used  only  the  variables  1  and  2 
for  prediction.     The  data  matrix  from  the  first  problem  is  used  with  three  selected  k 
values,  0.11,  0.13,  and  0.15.     The  least-squares  solution  is  almost  orthogonal  with  both 
minimum  and  maximum  VIF's  barely  larger  than  one.     Ridging  is  obviously  unnecessary,  and 
those  solutions  are  truncated  after  the  first  two  k  values. 

Note  that  indicators  for  weighting  and  for  an  intercept  are  repeated  (see  p.  24) 
so  that  the  problem  is  correctly  labeled.     However,  the  indicators  have  no  effect  on 
the  solution. 
APPENDIX A


Programing  Notes 

The  main  program,  BREX,  interprets  the  control  cards  for  each  problem  set.     If  raw 
data  are  input,  subroutine  MTRX  is  called  to  read  them  using  subroutine  TRNX  to  generate 
variables  for  the  regressions.     MTRX  also  produces  punched  output  of  the  moment  matrix 
in  vector  S,  identical  to  that  of  REX,  if  requested.     The  elements  of  S  are  corrected 
for  the  mean  if  IMO  =  0  and  the  weights  are  adjusted  to  sum  to  the  number  of  observations. 
BREX  then  creates  the  correlation  matrix  for  the  model  specified  on  control  card  3  from 
the vector S. The standardized least-squares solution is found by inverting the
correlation matrix using subroutine NVRT. Gauss-Jordan elimination with column pivoting to
maximize the pivot elements is the technique used. The inversion is performed in double
precision on the scaled (correlation) form of the matrix to achieve a minimum of
rounding error (Deegan 1976). As a check for numerical singularity, the determinant of
X'X is output.
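
The inversion step can be sketched as follows. This Python fragment is not a translation of NVRT: it uses row interchanges to bring the largest available element into the pivot position, which is one common form of pivoting and is an assumption here, and the function name `invert_with_det` and the test matrix are hypothetical. Python floats are double precision, in keeping with the text.

```python
def invert_with_det(a):
    """Gauss-Jordan inversion with pivoting; returns (inverse, determinant)."""
    n = len(a)
    # Augment the matrix with the identity: [A | I] becomes [I | A^{-1}].
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det                     # a row swap flips the sign
        d = m[col][col]
        det *= d                           # determinant is the pivot product
        m[col] = [x / d for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m], det


# Hypothetical correlation matrix.
corr = [[1.0, 0.9, 0.5], [0.9, 1.0, 0.4], [0.5, 0.4, 1.0]]
inverse, det = invert_with_det(corr)
print("determinant:", det)
print("VIF's:", [inverse[i][i] for i in range(3)])
```

A determinant close to zero signals near singularity, and the diagonal of the inverse gives the least-squares VIF's directly.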

The  normal  regression  coefficients  and  their  variances  are  obtained  from  the 
standardized solution (Draper and Smith 1966). The estimates of the MSE, ESS, and
adjusted R^2 (\bar{R}^2) are all based on the normalized solution.

The  ridged  solutions  are  calculated  in  a  similar  manner,  first  adding  k  to  the 
diagonal  of  the  correlation  matrix.     Since  the  residuals  are  not  available  from  matrix 
input,  the  RSS  is  calculated  by 

RSS(k) = Y'Y - (B^*)'X'Y - k(B^*)'B^*    (13)

as given by Hoerl and Kennard (1970a).
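
Equation (13) can be checked numerically against a direct sum of squared residuals. The Python sketch below does this for two regressors; the data and all function names are hypothetical, and the 2 x 2 system is solved by Cramer's rule purely to keep the example short.

```python
import math


def standardize(col):
    """Center a column and scale it to unit sum of squares."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ridge_two(xtx, xty, k):
    """Solve (X'X + kI) B* = X'Y for two regressors by Cramer's rule."""
    a11, a12 = xtx[0][0] + k, xtx[0][1]
    a21, a22 = xtx[1][0], xtx[1][1] + k
    det = a11 * a22 - a12 * a21
    return [(xty[0] * a22 - a12 * xty[1]) / det,
            (a11 * xty[1] - a21 * xty[0]) / det]


def rss_from_moments(yty, xty, b, k):
    """Equation (13): uses only Y'Y, X'Y, and the ridge estimates."""
    return yty - dot(b, xty) - k * dot(b, b)


def rss_from_residuals(cols, y, b):
    """Direct residual sum of squares, which needs the raw observations."""
    fitted = [sum(bj * col[i] for bj, col in zip(b, cols))
              for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data, standardized as assumed in the text.
cols = [standardize([1.0, 2.0, 3.0, 4.0, 5.0]),
        standardize([2.0, 1.0, 4.0, 3.0, 6.0])]
y = standardize([1.0, 3.0, 2.0, 5.0, 4.0])
xtx = [[dot(u, v) for v in cols] for u in cols]
xty = [dot(u, y) for u in cols]
yty = dot(y, y)

for k in (0.0, 0.05, 0.2):
    b_star = ridge_two(xtx, xty, k)
    print(k, rss_from_moments(yty, xty, b_star, k),
          rss_from_residuals(cols, y, b_star))
```

The two columns of output agree to rounding error, which is why the moment matrix alone is sufficient input for the ridged RSS.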
APPENDIX B


Card  Deck  Setup  and  Output  Generated  by  Example  Data  Sets 
Card  Deck  Setup  for  Gorman-Toman  Data  (CDC's  Scope  or  NOS  /  BE  Operating  Systems) 

CM60000  on  the  job  card  indicates  that,  at  most,  60,000  (octal)  words  of  memory 
are  required  to  run  this  program.     The  zeros  on  BREX  control  card  2  need  not  be  punched, 
as  a  blank  is  read  as  a  zero. 

JOBNAME,CM60000.
ATTACH(TAPE10,GTDATA,ID=MITCH)
REWIND(TAPE10)
FORTRAN.
LGO.
*EOR     (end of record card)
(followed by the FORTRAN source deck BREX and subroutines MTRX and NVRT)
      SUBROUTINE TRNX
      COMMON/RMT/X
      COMMON/MT/D,W
      DIMENSION D(50), X(50)
      RETURN
      END
*EOR
    GORMAN TOMAN TEN FACTOR PROBLEM
   11   10    0    0    0   10    0         0    0    0
XXXXXXXXXXY
blank card
blank card
blank card
DONE
*EOF     (end of file card)
Card Deck Setup for Hald Data Using a Binary Version of BREX
(CDC's Scope or NOS/BE Operating Systems)


This  example  uses  a  binary  (compiled)  version  of  BREX  and  subroutines  NVRT  and 
MTRX,  which  reside  on  the  permanent  file  BINARYBREX.  A  source  deck  of  TRNX  is  used 
because  it  must  be  adjusted  to  suit  a  given  run. 

JOBNAME,CM60000.
ATTACH(BREX,BINARYBREX,ID=MITCH)
REWIND(BREX)
FORTRAN.
LOAD(LGO)
BREX.
*EOR     (end of record card)
      SUBROUTINE TRNX
      COMMON/RMT/X
      COMMON/MT/D,W
      DIMENSION D(50), X(50)
      DO 1 J = 1,5
    1 X(J) = D(J)
      W = 1.0
      RETURN
      END
*EOR
    DATA FROM HALD GIVEN BY DRAPER AND SMITH (1966) P. 395
    5    4    0    0    0    0    1        13    5    7
XXXXY
blank card
(4F3.0,F6.1)
blank card
  7 26  6 60  78.5
  1 29 15 52  74.3
   ...
 10 68  8 12 109.4
    DATA FROM HALD GIVEN BY DRAPER AND SMITH (1966) P. 375
    5    2    3    1    0    0    1         0    0    0
XX  Y
    0.11    0.13    0.15
blank card
blank card
DONE
*EOF     (end of file card)
1979. BREX: A computer program for applying ridge regression techniques
to multiple linear regression. USDA For. Serv. Gen. Tech. Rep.
INT-51, 25 p. Intermt. For. and Range Exp. Stn., Ogden, Utah 84401.
